FractalNet: Ultra-Deep Neural Networks without Residuals

Authors

  • Gustav Larsson
  • Michael Maire
  • Gregory Shakhnarovich
Abstract

We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a single expansion rule generates an extremely deep network whose structural layout is precisely a truncated fractal. Such a network contains interacting subpaths of different lengths, but does not include any pass-through connections: every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers. This property stands in stark contrast to the current approach of explicitly structuring very deep networks so that training is a residual learning problem. Our experiments demonstrate that residual representation is not fundamental to the success of extremely deep convolutional neural networks. A fractal design achieves an error rate of 22.85% on CIFAR-100, matching the state-of-the-art held by residual networks. Fractal networks exhibit intriguing properties beyond their high performance. They can be regarded as a computationally efficient implicit union of subnetworks of every depth. We explore consequences for training, touching upon a connection with student-teacher behavior, and, most importantly, demonstrating the ability to extract high-performance fixed-depth subnetworks. To facilitate this latter task, we develop drop-path, a natural extension of dropout, to regularize co-adaptation of subpaths in fractal architectures. With such regularization, fractal networks exhibit an anytime property: shallow subnetworks provide a quick answer, while deeper subnetworks, with higher latency, provide a more accurate answer.
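The expansion rule and the drop-path join described in the abstract can be sketched as follows. This is a minimal structural sketch inferred from the abstract's description, not the authors' implementation: `conv` stands in for any layer with a filter and nonlinearity, the join is assumed to average its inputs, and the function names are hypothetical.

```python
import random


def fractal(conv, depth):
    """Build the fractal block f_depth as a callable.

    f_1(z) = conv(z); f_{C+1}(z) = mean(f_C(f_C(z)), conv(z)).
    Repeated application of this single rule yields the truncated-fractal
    layout: subpaths of many lengths, no pass-through connections.
    """
    if depth == 1:
        return lambda z: conv(z)
    deeper = fractal(conv, depth - 1)
    return lambda z: 0.5 * (deeper(deeper(z)) + conv(z))


def path_lengths(depth):
    """Distinct path lengths (in conv layers) through f_depth,
    illustrating the implicit union of subnetworks of every depth."""
    if depth == 1:
        return {1}
    prev = path_lengths(depth - 1)
    # A path through the stacked f_{depth-1} blocks picks one path in
    # each; the parallel branch contributes a single conv.
    return {a + b for a in prev for b in prev} | {1}


def join(inputs, drop_prob=0.0):
    """Local drop-path join (sketch): average the surviving inputs,
    keeping at least one so the signal never vanishes. The paper also
    describes a 'global' mode that samples a single path end to end."""
    kept = [x for x in inputs if random.random() >= drop_prob]
    if not kept:
        kept = [random.choice(inputs)]
    return sum(kept) / len(kept)
```

For example, `path_lengths(3)` yields `{1, 2, 3, 4}`: the deepest path doubles with each expansion, while the shallowest stays a single layer, which is what gives the extracted subnetworks their anytime trade-off between latency and accuracy.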


Similar resources

Deep Convolutional Neural Network Design Patterns

Recent research in the deep learning field has produced a plethora of new architectures. At the same time, a growing number of groups are applying deep learning to new applications. Some of these groups are likely to be composed of inexperienced deep learning practitioners who are baffled by the dizzying array of architecture choices and therefore opt to use an older architecture (i.e., Alexnet...


SMASH: One-Shot Model Architecture Search through HyperNetworks

Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectiv...


Exploring the Depths of Recurrent Neural Networks with Stochastic Residual Learning

Recent advancements in feed-forward convolutional neural network architecture have unlocked the ability to effectively use ultra-deep neural networks with hundreds of layers. However, with a couple exceptions, these advancements have mostly been confined to the world of feed-forward convolutional neural networks for image recognition, and NLP tasks requiring recurrent networks have largely been...


Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks

In spite of advances in object recognition technology, Handwritten Bangla Character Recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwritings. Even the best existing recognizers do not lead to satisfactory performance for practical applications related to Bangla character recognition and have much lower p...


Deep neural networks for time series prediction with applications in ultra-short-term wind forecasting

The aim of this paper is to present deep neural network architectures and algorithms and explore their use in time series prediction. Existing and novel input variable selection algorithms and deep neural networks are applied for ultra-short-term wind prediction. Since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions, recent research e...



Journal:
  • CoRR

Volume: abs/1605.07648

Publication year: 2016